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ABSTRACT 


This thesis presents a simple and efficient method of 
implementing a semantic type checking system for use with 
relational databases. Numeric data typically represent 
measures of a specific property or characteristic of a real 
world object.. Computers, manipulate only the numeric value. 
It is the responsibility of the user to ensure that the data 
are handled in a manner consistent with its meaning. If the 
semantics associated with the numbers are stored in a data 
dictionary, semantic consistency can be verified by the 
database system. This increases the integrity of data 
manipulation and helps ensure meaningful results. This thesis 
demonstrates a simple scheme of representing the property, or 
quantity, and unit of measure associated with numeric 
attributes. This information is then used to verify 
dimensional consistency of database queries and to 
automatically convert units across systems of measurement. 
Finally, a concept is defined for each relation in the 
database. These concepts can be used to build a concept 
hierarchy to help ensure queries are consistent with the 
semantics of the database design. 
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I. 


INTRODUCTION 


This thesis addresses the problem of adding semantic 
information to numeric-valued data attributes stored in a 
database. Numeric data normally are associated with a 
characteristic or dimension of some real world object or 
event. The data typically reference a specific system of 
measurement through a particular unit of measure. Database 
management systems (DBMS), however, typically store only the 
numeric value without reference to this measurement. The 
values are then manipulated without regard to their real-world 
meaning. For example, a value measured in miles might be 
added to one measured in pounds. It is currently the 
responsibility of the user to ensure that data are manipulated 
in a manner consistent with its meaning. We propose a method 
of storing the meaning or semantics of the numeric data along 
with the actual data values. This will allow the system to 
verify automatically the semantic consistency of data 
manipulation. 

A. PROBLEM DESCRIPTION 

In physics and engineering, as well as other applications, 
including units of measure along with numerical values has a 
long tradition. The numbers by themselves have arbitrary 









meaning. Only with a unit of measure is there a standard of 
comparison with which to determine the extent of something. 
Suppose X is measured in units of feet. We then know the 
value assigned to X represents the number of times one foot 
occurs in the object represented by X. Furthermore, we know 
X should not be added, subtracted, or compared with Y unless 
both are expressed in feet. (Clemence, 1987, p. 8-9) 

In addition to units of measure, numeric data are 
associated with some object or event. Specifically, the data 
represent a measure of some quantifiable property of that 
object or event. This semantic information ensures data are 
manipulated in a meaningful way. Apples should not be added 
to oranges and the weight of a car should be subtracted from 


its length. 

Verification of the semantic consistency of data 
manipulation is currently the responsibility of the user. The 
computer manipulates only the numbers. To verify semantic 
consistency, each numeric-valued symbol is replaced with its 
explanatory description. Two types of dimensional calculus 
are then applied. First, is the calculus of measurement 
units: units are multiplied or divided and an analysis is 
performed to ensure that the numbers being added, subtracted 
or compared ali have the same scale of reference. The other 
calculus is concerned with what the numbers represent ^nd the 
properties being measured. Again the calculus prescribes 
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certain rules for multiplication, division, addition, 
subtraction, and comparison. (Bradley and Clemence, 

1987, p. 404) 

The rules of semantic consistency also apply to the 
relational database environment. The relational model 
provides a powerful data access and manipulation capability. 
Data are stored in two dimensional tables, a structure easily 
understood by most users. Many updates and queries are 
accomplished by combining, or joining, multiple tables in 
various ways. This provides an extremely powerful access 
capability, particularly for ad hoc queries. 

This ease of access and unlimited data manipulation 
capability does not come free. The user is still responsible 
for ensuring data are combined in a manner consistent with 
their real-world meaning. There is still nothing that 
prevents the weight of an apple from being added to the 
diameter of an orange. In small databases with relatively few 


:abies, the user might be able to track the meaning oi 


data himself. However, this is unlikely in larger systems 


with multiple users. 


E. METHODOLOGY 

This thesis focuses on design and implementation of an SQL 
shell to enforce dimensional and unit consistency of database 
curries. The research includes the following steps: 
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1. Review of the semantics of numeric data and the 
application of concept hierarchies in model management as 
presented by Bradley and Clemence (Bradley and Clemence, 
1988). 

2. Application of the semantics of quantity and unit of 
measure to numeric data in the relational data model. 

3. Design of data dictionary tables needed to incorporate 
semantic information. 

4. Design and implementation of an SQL preprocessor which 
uses the semantic information to verify dimensional 
consistency of database queries. 


C. THESIS STRUCTURE 


Our resear 

ch is presented in 

six chapters. 

Chapter 

II 

discusses the 

semantics of nume 

ric data and 

presents 

an 

abstract data 

type consisting of 

a value descr 

ipuion and 

a 


semantic description. Chapter III demonstrates the benefits 
of including the semantics in the relational data model. 
Chapter IV describes how to represent semantic information in 
the data dictionary. Chapter V presents a simple and 
efficient method of ensuring the dimensional consistency of 
database queries. Chapter VI shows how the semantics can be 
used to build concept hierarchies. These hierarchies are then 
used to enforce semantic integrity of database queries. 
Chapter VII presents our conclusions and recommendations for 
further research. 




II. THE SEMANTICS OF NUMERIC DATA 


Numeric data typically are associated with quantifiable 
characteristics or phenomena in the real world. The computer, 
however, manipulates only the numbers without regard to their 
meaning. The user has the responsibility of ensuring that 
numbers are manipulated in a way consistent with their 
meaning. In this chapter we discuss the semantics of numeric 
data and propose an extended data type for use in a more 
rigorous typing system. Data types and type checking will be 
discussed first in general and then in the context of units of 
measure and dimensional analysis. Finally, the semantics of 
quantity and concept are added and an abstract numeric data 
type is proposed. 


A. DATA TYPING AND TYPE CHECKING 

Computers store and manipulate data and programming 
constructs as combinations of zeros and ones. At the bit 
level, there is no distinction between a character and a 
number. 3oth are represented as binary numbers. Data typing 
provides the distinction and establishes how the contents of 
memory are to be interpreted by the computer. 







A data type specifies the domain of the data value. Most 
programming languages support a few well-known base data 
types: character, real, integer and boolean. More complex 
types, such as arrays, records and sets, may be constructed 
from the base types. The type also identifies the operations 
which may be performed over its domain. The type domains, the 
operations and any rules for conversion or coercion between 
data hypes represent a type system. Type checking uses the 
semantics of the type system to ensure correct interpretation 
and meaningful manipulation of the data. (Aho, Sethi, and 
UlIman, 1SS6, p.343-347) 


S. NUMERIC DATA TYPES AND UNITS OF MEASURE 

Type checking ensures that expressions are consistent with 
the typing system. For example, only numeric data values 


(r eu. 


,— .* i. 


r.ceger ) may os operarec. on oy me ari«.nmecxc 


operators oj_ addition, subtraction, rnuit ipiicanon 


division. This increases the security of the programming 


language by preventing meaningless or incorrect operations 
from being performed (Clemence, 1987, p. 5). 


Current typing 

systems 

numeric data types. 

Only rea 

supported and most 

systems ; 

coercion between 

the two 

themselves have nc 

\ meaning. 

represent the guar.t 

im *. aCCV V* or* 


provide only a limited set of 
I and integer types are typically 
provide rules for conversion or 
types. However, numbers by 
In the real world, numbers 
of some event or characteristic. 


e 



To have meaning, they must be associated with a unit of 
measure (Ciemence, 1987, p. 8). The unit of measure provides 
a standard by which numbers may be compared and gives semantic 
value to the number. 

Including a unit of measure with a numeric value in a data 
type increases the reliability and readability of mathematical 
calculations (Karr and Loveman, 1978, p. 386), Such a data 
type would also increase security of the language itself 
(Ciemence, 1987). Gehani (1977), House (1983), Karr and 
Loveman (1978) have all proposed programming languages with 
units of measure. An extension of the PASCAL language to 
include units has been implemented by Dreiheller, 
Moerschbacker, and Mohr (1986). Ciemence and Bradley contend 
that units alone do not increase security and recommend 
including the semantics of guantity and concept (Bradley and 
Ciemence, 1987). This idea is also applicable to numeric 
decs in relationcarabase systems. 
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C. DIMENSIONAL ANALYSIS 


1. Units of Measure 

Each numeric data value represents the measure of some 
dimension in a particular unit of measure. While every unit 
of measure is associated with a specific dimension, each 
dimension may have several units of measure depending on the 
system of measurement being used. Each system recognizes 
certain fundamental dimensions with a base unit of measure and 
possibly several other units. Units can be converted within 
and across systems of measure according to specific laws of 
conversion. For example, in the standard metric system there 
are seven fundamental dimensions (Beyer, 1987): 


Dimension 

Base Unit 

Other Units 

Length 

Meter 

Kilometer, Centimeter 

Mass 

Ki1ogram 

Grams, Milligrams 

Time 

Second 

Hour, Minute 

Electric Current 

Ampere 


Temperature 

Deg(Kelvin) 

Deg(Celsius) 

Luminous Intensity 

Candela 


Amount of Substance 

Mole 



To this we add the fundamental dimension of currency measured 
in dollars. There are also dimensionless numbers, such as 
ratios, which have no units and may be considered measures of 
a null dimension. This gives nine fundamental dimensions-- 
length, mass, time, electric current, temperature, luminous 
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intensity, amount of substance, currency and the null 
dimension. There are also derived dimensions and associated 
units of measure. A derived dimension is obtained as a 
product of two or more fundamental dimensions or their 
inverses (Bhargava, 1990, p. 5). For example, volume is 
derived by multiplying length times length times length. 

Units of measure are also fundamental or derived 
according to which dimension they measure. Thus, given a 
value's unit of measure we can easily determine the dimension. 
Each unit of measure is associated with only one dimension. 
For this reason, the distinction between the units of measure 
and the dimension being measured is often ignored. However, 
since each dimension may have many associated units, we retain 
the distinction to allow a check of dimensional consistency 
regardless of the units being used. 

2. Dimensional Consistency 

The addition of dimension and unit of measure to 
numeric data types provides a more rigorous and secure typing 
system. Type checking becomes a problem in dimensional 
analysis. Not only the consistency of types is checked but 
consistency of dimensions is verified as well. The laws of 
dimensional consistency are shown below (House, 1983, p. 366): 
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1. Values with similar dimensions may be added or 
subtracted, yielding a value possessing the same 
dimension. 

2. Values with dissimilar or similar dimensions may be 
multiplied. The dimension of the result is the product 
of the dimensions. 

3. Values with dissimilar or similar dimensions may be 
divided. The resulting dimension is the ratio of the 
original dimensions. 

4. No other operations are allowed. 

Practical application of Rule 1. requires that the 
units of measure be the same. If the values have the same 
dimension, the units of measure will differ only by a scalar 
conversion factor. Units that can be converted to one another 
are said to commensurate. The conversion of commensurate 
units must take place before the addition or subtraction of 
ins values. 

Checking an expression for dimensional consistency is 
essentially a problem of symbolic arithmetic. The arithmetic 
operations are performed on the non-numeric symbols of 
dimension and units of measure. For example, the expression 
a + (b*c) might become length + (time * speed) or length + 
(time * [ length/time]). According to Rule 2, dissimilar 

dimensions (time and length/time) may be multiplied with the 
resulting dimension equal to the product of the dimensions. 
Symbolically this multiplication results in a dimension of 
length. Rule 1 allows a length to be added to a length. The 
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expression is found to be dimensionally consistent and gives 


a result that the expression is of dimension length. If the 
expression is on the right side of an assignment, X := a + 
(b*c), the object represented by X must be a number of 
dimension length to maintain consistency. 

D. TKE SEMANTICS OP QUANTITY AND CONCEPT 

Now that we have seen how dimensions and units are 
manipulated we can extend the numeric data type to include 
additional information. 


Scientific 


Unit of measurement [dimension] alone is insufficient 
convey information accurately to someone else, 
observation requires two kinds of description: a 

quantitative description so that the observed phenomenon 
can be distinguished from other phenomena; and a unit of 


measurement 

occurrences 

11 ) 


to distinguish quantitatively similar 
f different magnitude. (Ciemence, 1987, p. 


While a dimension is an abstract concept used to evaluate 
expressions and units of measure, a quantity is the actual 
attribute being measured. Numerical values of the same 
magnitude are not equal unless they describe the same quantity 
and measure the same dimension (Ciemence, 1987, p. 14). The 
quantities LENGTH, WIDTH and ALTITUDE all are represented by 
the fundamental dimension of length but clearly represent 
different measurable phenomena. 






Like dimensions, quantities may be either fundamental or 
derived. A fundamental quantity, can not be expressed as a 
combination of quantities. A derived quantity is obtained by 
combining fundamental quantities. For example, the quantity 
AREA can be derived from the quantities LENGTH and WIDTH. The 
rules for combining quantities and checking consistency are 
the same as for dimensions: 


1. Similar quantities may be added or subtracted, yeilding 
a value possessing the same quantity. Dissimilar 
quantities may not be added or subtracted. 


2 . 


Dissimilar or similar quantities may 
quantity cf. hte result is the product 


multiplied. The 
of the quantities. 


3. Dissimilar or similar quantities may be divided. The 
resulting quantity is the ratio of the original 
quantities. 


4. No other operations are allowed. 


These rules are applied in addition to the rules for 
dimensional arithmetic and consistency. Failure to meet any 
rule causes an inconsistency. Thus, an expression specifying 
the addition of LENGTH (measured in feet) to a WIDTH (measured 
in meters) would be consistent under the rules of dimensional 
arithmetic. (Feet and meters both measure the fundamental 
dimension of length and are therefore commensurate.) However, 
since the quantities are not equal, the expression is 
inconsistent under the rules of quantity arithmetic. "The 
operations of addition and subtraction, and the use of 
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relational operators ( =, <>, <=, >= ), are meaningful only 
when their operands can be reduced to a common quantity and 
dimension (Clemence, 1987, p. 14)." 

Quantity and dimension ensure that values are associated 
with a measurable attribute and have a scale of reference in 
a measurement system. Quantities in turn are attributable to 
an object, or the occurrence of some event, in the real world. 
This object/evenl is called a concept (Bradley and Clemence, 
1987, p. 4). Each concept has an associated set of quantities 
representing measurable attributes of the object. For 
example, the concept Crate has the attributes LENGTH, WIDTH, 
HEIGHT, and WEIGHT. Unlike quantities and dimensions, there 
are no derived concepts. Concepts cannot be multiplied or 
divided. 


E. AS EXTENDED NUMERIC DATA TYPE 

Armed with the semantics of concept, quantity and 
dimension, we can now define an extended numeric data type. 
The type consists of two components, a value description ana 
a semantic description. The vaane description consists of the 
number and its unit of measure. Only v-i.e unit o£ measure may 
be associated with each number. The dimension of the value is 


determined by the unit of measure. The semantic description 








consists of a quantity and>a concept possessing that quantity. 
Our extended data type is shown in Figure 2-1. 


} value description 


} semantic description 

Figure 2-1. Numeric Data Type With Semantics 

This abstract nume ic data type can now be applied to 
the relational data model. In the next chapter, we present 
the relation as an abstract data type and demonstrate how the 
semantics of numeric valued data can benefit relational 
operations. 
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III. NUMERIC DATA IN RELATIONAL DATABASES 

The semantics of numeric data are easily represented in 
the relational database environment. In this chapter we 
discuss the relational database as a collection of two 
dimensional tables wrure rows represent unique entities and 
columns represent attributes. Next we sxscuss three 
fundamental operations with relations: projection, selection 
and joining. Finally, we apply the semantics of concept, 
quantity and unit of measure to the relational model and 
demonstrate their use in relational operations. 

A. RELATIONAL DATA DEFINITION 

A relational databas is a body of information stored in 
two-dimensional tables, or relations. Each relation ccr;*i,sts 
of one cr more rows, called tuples. Each tuple represents an 
instance of an object, cr entity, about which data is 
collected. A relation has the properties shown below (Kroenke 
and L ol an, 1988, p 297): 
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1. The entries in the relation are single-valued. Repeating 
groups and arrays are not allowed. 

2. The entries in any column are ail of the same data type, 
or domain. Each column has a unique name and the order 
is immaterial. Columns of a relation are called 
attributes. 

3. No two rows, or tuples, ; n the relation are identical and 
the order of rows is insignificant. 

It may be helpful to think of a relation as a complex data 
type composed of attributes defined over a set of base data 
types. Commercially available relational database systems 
support only a limited ;t o e base data types. For example, 
ORACLE provides a NUMBER type, a CHAR(i) VAR type for 
character strings, a DATE and a TIME type. The NUMBER type 
provides facilities for handling real and integer numeric data 
values. A set of functions and operations is specified for 
each domain type. More advanced systems allow abstract data 
types tc be defined for domains and allow operations to be 
defined on them (Osborn and Heaven, 1986, p. 359). These 
systems are relatively complex and normally are designed to 
support specialized applications such as computer-aided 
design. Object-oriented DBMS also provide more robust types 
although such systems are still largely experimental and 
inefficient. 

The description of the structure of a relation and the 
specification of its attribute domains makes up a 


relational 








schema. The schema identifies the name of the relation, the 
names and domains of its attributes, and the generalized 
format or structure of the relation. It is in the schema that 
the semantics of concept, quantity and units of measure will 
be specified. 

Relational database terminology is summarized in Figure 

3-1. 


Term 

Meaning 

Relation 

Two-dimensional table. 

Attribute 

Column in a relation. 

Domain 

Set of values an attribute 
can have. 

rrt -. 1 4 _ 

| Jk Up 4. S 

Row in a relation. 

j Relational Schema 

1_ 

The structure and domain 
constraints of a relation. 


Figure 3-1. Relational Database Terminology 


B. RELATIONAL DATA MANIPULATION 

The power of the relational data model lies in its 
flexibility and ease of use. Storing data in tables is 
intuitive and easily understood by most users. Relationships 
between tables, represented by common data values, are 
recognizable and readily established. Finally, because the 
structure is so simple and elegant only a small set of 
operations is needed to process it. There are three 
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fundamental operations used to process relations (Kroenke and 
Dolan, 1988, p. 313-317): 


1. Projection selects specified attributes from a relation 
to form a new relation. 

2. Selection creates a new relation by selecting rows that 
satisfy certain conditions. 

3. Join creates a new relation from the rows in two or more 
relations that have attributes satisfying one or more 
conditions. Join is a Cartesian product of relations 
coupled with projection and/or selection. 


Operations on relations are specified using a data 
manipulation i-nguage. Most commercially available relational 
database systems support ANSI SQL (Structured Query Language) 
or an extended version of the standard. The nonprocedural 
nature of the language allows easy access to the data without 
requiring knowledge of the mathematical foundations of the 
relational model. In fact, a single SQL construct performs 
ail three of the fundamental relational operations. The basic 
syntax of the SELECT statement is: 


SELECT [ALL | DISTINCT] select-list 

FROM {{table-name view-name}[correlation-name]} . 
[WHERE search-condition]; 


Additional clauses may specify ordering or grouping, but the 
syntax shown is sufficient to illustrate the three relational 
operations. 
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1. Projection 

The projection operation selects specific attributes, 
or columns, from a relation. The result is a logical relation 
containing the selected attributes. In the SELECT statement, 
the attributes named in the select-list identify the columns 
to be projected from one relation to the relation being 
formed. The FROM clause specifies the relation being used. 
For example, consider a relation containing distances between 
U.S. cities (Figure 3-2). 


Origin 

Destination 

Mileage 

New York 

Seattle 

2408 

New York 

Chicago 

713 

Chicago 

Los Angeles 

1745 

Los Angeles 

Denver 

831 

Los Angeles 

New York 

2451 

New York 

Atlanta 

748 


Figure 3-2. US_ROUTES Relation 

The projection of the relation US_ROUTES on the 
attribute ORIGIN is specified by the statement: 

SELECT origin 
FROM us_routes; 
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The result of the projection is shown in Figure 3-3. Note 
that the original relation US_ROUTES contains six tuples, or 
rows. The result of the projection contains only three. 
Since the result of a projection is a relation, the third 
property of relations prevents duplicate rows from appearing 
in a projection. Three tuples were eliminated after the 
projection was done. 


Origin 


New York 


Chicago 


Los Angeles 


Figure 3-3. Projection cf US_ROUTES on ORIGIN 


Projection identifies the attributes to be contained 
in the new relation. Selection identifies the tuples to be 
included. Rows are specified by the search-condition in the 
WHERE clause of the SELECT statement. The search-condition 
describes a simple or compound predicate that evaluates as 
true, false or unknown about a given row (Viescas, 1989, p. 
63). If the condition is true when applied to a row, the row 


2C 





is included in the resulting, relation. 


For example, the 


statement 

SELECT origin, destination, mileage 
FROM us_routes 
WHERE mileage > 1C0C; 

results in the relation shown in Figure 3-4. 


Origin 

Destination 

Mileage 

New York 

Seattle 

2408 

Chicago 

Los Angeles 

1745 

Los Angeles 

New York 

2451 


Figure 3-4. Selection of US_ROU?ES where MILEAGE > 1000 


3. Join 

The relational data model derives much of its power 
from the ability to combine information from, or join, two or 
more relations. A join is specified by including more than 
one relation in the FROM clause of the SELECT statement. The 
join operation requires three logical steps. First, the 
Cartesian product of the specified relations is determined. 
This results in a logical relation which pairs all the tuples 
of the first relation with all the tuples of the second 
relation paired with all the tuples of the third relation and 
so on. A selection operation is then performed to eliminate 
some of the tuples. (In practice, the selection takes place 
before the Cartesian product is formed. This eliminates many 
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rows from the product and saves time.) Finally, a projection 
may take place to remo ; certain attributes. 

To illustrate, consider a join between the US_ROUTES 
relation and a relation containing data on international 
routes (Figure 3-5). A SELECT statement with the two tables 
listed in the FROM clause and with no selection criteria 
specified in a WHERE clause will return the complete Cartesian 
product. This product will have six attributes (Origin, 
Destination, Mileage, Source, Terminal, Distance) and will 
contain 54 tuples. 


Source 

Terminal 

Distance 

New York 

London 

5589 

Seattle 

Tokyo 

7714 

London 

Berlin 

933 

London 

Rome 

1436 

London 

Paris 

346 

London 

Moscow 

2504 

Tokyo 

Singapore 

5317 

Tokyo 

Peking 

2100 

New York 

Rio De Janeiro 

7733 


Figure 3-5. INTERNATL_ROUTES Relation 

To obtain a more meaningful result, we need to include 
join criteria in the WHERE clause of the SELECT statement. 
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This eliminates rows from the Cartesian product that contain 
superfluous information. For example, including 

WHERE US_RO T JTES.DESTINATION = INTERNATL_ROUTES. SOURCE 
in the SELECT statement results in the relation shown in 
Figure 3-6. 


Origin 

Destination 

Mileage 

Source 

Terminal 

Distance 

New York 

Seattle 

2408 

Seattle 

Tokyo 

7714 

LA 

New York 

2451 

New York 

London 

5589 

LA 

New York 

2451 

New York 

Rio 

7733 


Figure 3-6. A Join Between US_ROUTES and INTERNATL_ROUTES 


Because this join includes only rows that have 
matching attribute values from both tables, it is called an 
Inner Join . If we do not provide a select-list, or specify 
that all columns be selected, an inner join returns the 
matched data values twice (once in the DESTINATION column and 
one in the SOURCE column). We can eliminate this duplication 
by select only one of the columns identified in the join 
criteria: 

SELECT origin, destination, distance, terminal, mileage 
FROM us_routes, internatl_routes 

WHERE us_routes.destination = 
internatl_routes.source; 






























This results in a relation that does not contain the 
duplicate attribute values (Figure 3-7). This is a Natural 
Inner Join. 


Origin 

Destination 


Terminal 

— 

Distance 

New York 

Seattle 

2408 


HQ| 

LA 

New York 

| . 

London 

5589 

LA 

New York 

2451 

Rio 

7733 


Figure 3-7. Natural Inner Join 


Relations also can be joined using more complex criteria. 
The search_condition of the WHERE clause may include multiple 
predicates of varying degrees of complexity. Our focus is on 
predicates involving numeric valued data attributes. These 
may be as simple as a relational comparison between two 
attributes or between an attribute and a numeric constant. 
Predicates also may include more complex mathematical 
expressions combining many attributes, constants and 
relational operators. 

For example, we formed a natural inner join of the 
relations US_ROUTES (Figure 3-2) and INTERNATL_ROUTES (Figure 
3-5). The resulting relation (Figure 3-7) contained 
information about U.S. shipping routes with connections to 
international cities. We cun go from New York to Seattle and 
then on to Tokyo. If we start in Los Angeles we can reach 
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either London or Rio De Janeiro via New York. Now suppose we 
want to limit our trip to less than 9,000 miles. The SELECT 
statement 

SELECT origin, destination, terminal 
FROM us_routes, internatl_routes 

WHERE us_routes.destination = internatl.source AND 
us_routes.mileage + internatl.distance < 9000; 

returns the relation shown in Figure 3-8. Only one of the 
three tuples in the original join meets the second part of our 
join criteria. The mileage from Los Angeles to New York is 
2451 and the distance from New York to London is 5589, giving 
a total trip of only 8040 miles. (Los Angeles to Rio: 2451 + 
7733 = 10184; New York to Tokyo: 2408 + 7714 = 10122) 


Origin 

Destination 

Terminal 

Los Angeles 

New York 

London 


Figure 3.8. Join Using a Compound Search Condition 

This result depends on several assumptions.. First, we 
assume that US_ROUTES and INTERNATL_ROUTES represent similar 
entities or concepts. Next, we assume that the attributes 
MILEAGE and DISTANCE both represent identical quantities. 
Finally, we treat the values contained in the MILEAGE and 
DISTANCE columns as if they were derived from the same unit of 
measure, miles. 
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What if the values in one table are measured in miles, 
while kilometers are used in the other? The result of our 
query then has no meaning, or at least may not answer the 
question being asked. The computer manipulates only the 
numbers not the semantics. Dimensional consistency of the 
query is left to the user. This responsibility of checking 
semantic consistency can be shifted to the database system, 
however, if the semantics are stored along with the data 
values. 

C. ADDING CONCEPT, QUANTITY AND UNITS TO THE RELATIONAL MODEL 
Application of our extended numeric data type to the 
relational model is fairly straightforward. A concept 
represents a type of object or event in the real world which 
has one or more measurable attributes or quantities . Thus, a 
concept specifies an entity type. A relation is simply an 
aggregation of specific instances of entities (tuples). Since 
each tuple represents an instance of the same entity type, one 
concept is sufficient for each relation. A concept may be 
associated with more than one relation but a relation 
represents only one concept. For example, two relations, one 
containing information about automobiles and the other data 
about trucks, may both represent the single concept Vehicle. 
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As defined earlier, a quantity is a measurable attribute 
or characteristic of an object. This relates directly to the 
column in the relational model. Every column in a relation 
represents a unique characteristic or attribute. To allow for 
meaningful use of data, each column defined over a numeric 
domain should have an associated quantity and a unit of 
measure. The second property of relations requires that all 
data values in a column be of the same data type. Therefore, 
only one quantity and one unit of measure apply to all data in 
the column. The same quantity, however, may be associated 
with multiple attributes both within a single relation and 
across other relations. 

Returning to our example, suppose the semantics of 
concept, quantity and unit of measure are added to the 
relations, US_ROUTES and INTERNATL_ROUTES. The schema, or 
structure, of the relations are shown below: 

US_ROUTES (Concept = AIR_R0UT2) 

Origin Character String 

Destination Character String 

Mileage Number (Quantity = Distance, 

Units = 2-iiles) 

INTERNATL_RO r JTES (Concept = AIR_ROCTE) 

Source Character String 

Terminal Character String 

Distance Number (Quantity = Distance, 

Units = Kilometers) 
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We now can check our earlier query 

SELECT origin, destination, terminal 

FROM us_routes, internatl_routes 

WHERE us_routes.destination= internatl_routes.source AND 
us_routes.mileage+ internatl_routes.distance < 900C 

for semantic and dimensional consistency. We focus on join 

criteria specified in the WHERE clause, particularly the 

portion dealing with numeric data attributes. The first 

portion of the predicate specifies a relational comparison of 

non-numeric attributes. This is outside the scope of our 

numerical dimensional check. 

The second predicate is a mathematical expression 
involving two numeric attributes (us_routes.mileage, 
internatl_routes.distance) and one numeric constant (9000). 
For the time being, we will assume the data type of the 
constant specifies Air_Route as its concept, Distance as the 
quantity and Miles as the unit of measure. Both operands on 
the left side of the expression are associated with the same 
concept, Air_Route, and quantity. Distance. The units of 
measure are different. Both Miles and Kilometers measure the 
fundamental dimension of length, however, and are therefore 
commensurate. Thus only a conversion factor (0.62137 
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Miles/Kilometer) need be applied to the expression to make it 
dimensionally consistent. The corrected query 

SELECT origin, destination, terminal 

FROM us_routes, internatl_routes 

WHERE us_routes.destination = internatl_routes.source AND 
us_routes.mileage + 

0.62137*internatl_routes.distance < 9000; 
returns the relation in Figure 3-9. Note that all three of 
the trips contained in the original join (Figure 3-7) are 
included in the new relation. All three have a total distance 
under 9000 miles, a fact we could not determine without 
performing a a.imensional check on the query and applying the 
necessary conversion factor. 


Origin 

Destination 

Terminal 

New York 

Seattle 

Tokyo 

Los Angeles 

New York 

London 

Los Angeles 

New York 

Rio De Janeiro 


Figure 3-9. Join After Check for Semantic Consistency 


In the next two chapters, we discuss the implementation of 
a type system shell for SQL which allows the database system 
to perform the dimensional analysis and apply the conversion 
automatically. 
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IV. REPRESENTIN'! THE SEMANTICS IN THE DATA DICTIONARY 


Implementation of an SQL type checking shell involves two 
primary functions. First, the semantics of concept, quantity 
and unit of measure must be incorporated in the relational 
database. This is accomplished in the design of the data 
dictionary, which contains the relational schema for the 
database. Also included in the data dictionary are the scalar 
factors needed for conversion of commensurate units of 
measure. 

In this chapter we discuss incorporation of the semantics 
into the data dictionary. First, we propose a prime number 
encoding scheme that allows numeric representation of the 
semantics. This transforms the problem of symbolic semantic 
manipulation to a much simpler numeric problem. Finally, we 
discuss the data dictionary tables needed to represent the 
semantics of concept, quantity and unit of measure. 

A. REPRESENTATION OF SEMANTICS 

Semantic manipulation to support type checking involves 
three primary operations (Bhargava, 1990, p. 5): 



1. Verification of equivalence. For example, kg*m/s is 
equivalent to (l/s)*m*kg. 

2. Simplification. For example, (m/s 2 )*s + m/s simplifies 
to m/s. 

3. Transformation of Commensurate Units. 

The first two operations are related and are the most 
complicated to automate symbolically. Bhargava proposes a 
prime encoding scheme which transforms the problem from one of 
symbolic arithmetic to a much simpler numeric problem. 

1. Verification of Equivalence 

The semantics of quantity and dimension are either 
fundamental or derived. Verification of the equivalence of 
fundamental semantics is easily implemented symbolically. 
There are no arithmetic operations required. If the semantics 
are represented by identical character strings, they are 
equivalent. If the string representations are different, the 
semantics are not equal. 

Derived semantics pose a more difficult problem of 
verification. Equivalent semantics can be derived by 
combining fundamental semantics in different ways. For 
example, the dimension (1/time)*length is equivalent to 
length/time. A simple comparison of the symbolic 

representation of the two dimensions does not readily reveal 
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this equivalence. In fact, more complex derived semantics can 
easily become a significant problem of symbolic mathematics. 

While there are programs capable of evaluating 
symbolic expressions, the problem can be transformed into one 
of simple numeric arithmetic by representing fundamental 
dimensions with prime numbers. The product (or quotient) of 
prime numbers is always unique. No matter how many ways 
fundamental semantics are combined, two combinations resulting 
in the same numeric value are equivalent. Semantic equality 
is equivalent to numeric equality. (Bhargava, 1990) 

The nine fundamental dimensions and their prime number 
representations are shown in Figure 4-1. Fundamental 
quantities can be represented in the same manner. Notice that 
the null dimension (for use with unitiess numbers) is 
represented by the number one (1), an identity for 
mu! tiplicatior.. 

The numeric representation of derived semantics is 
obtained by arithmetic combination of the prime number 
representations of the fundamental semantics. For example, 
the dimension volume is derived from the fundamental dimension 
of length according to the formula length*length*length or 
(length)'. Substituting the prime number representation for 
length into either of the formulas and carrying out the 
arithmetic operations gives us the numeric representation for 
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volume: 3*3*3 = (3)" - 27. This representation can then be 

used to derive the representation of the dimension density: 
density = mass / volume C=(5)/(27) = 0.185185185]. 


Fundamental 

Dimension 

Prime Number 
Representation 

null dimension 

1 

currency 

2 

1ength 

3 

mass 

5 

time 

7 

electric current 

11 

temperature 

13 

luminous intensity 

17 

amount of substance 

19 


Figure 4-1. Representation of Fundamental Dimensions 


Dimensions derived through division can result in 
floating point numeric representations. Such representations 
complicate verification of equivalence. The limitation on the 
number of significant digits which can be represented on a 
given computer gives rise to precision errors in floating 
point arithmetic. The order in which values are multiplied or 
divided can affect the final internal representation of the 
result. Thus, equivalent semantics derived via a different 
order of multiplication and division may not have identical 
numeric representations. 

We can avoid floating point precision errors by 
eliminating the division of the semantic representations. 
Each quantity and dimension can be represented by two integer 
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values, one for the prime product in the numerator (d- ; ) and 
the other for the denominator (dj). Equivalence of two 
dimensions can be verified by cross multiplying their d;'s and 
dj's and checking for numeric equality. Figure 4-2 shows a 
revised encoding of the fundamental dimensions and the derived 
dimensions of volume and density. 


Fundamental 

Prime 

Number 

Dimension 

Representation 


dl 

d2 

null dimension 

1 

1 

currency 

2 

1 

X 

length 

3 

1 

mass 

5 

1 

time 

7 

1 

electric current 

11 

1 

temperature 

13 

1 

lumi. ous intensity 

17 

1 

amount of substance 

19 

1 

volume 

9 

I 

density 

5 

9 


Figure 4-2. Two-part Representation of Dimensions 
2. Simplification 

Simplification of semantic expressions is now 
straightforward. The laws of consistency are readily 
implemented since semantic equality is easily tested. 
Multiplication of semantic terms requires multiplication of 
the d^*s and dj's. Division requires cross multiplication of 
dj's and dj’s. For addition and subtraction, check semantic 
equivalence. (Bhargava, 1990, p. 10-11) 
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B. DATA DICTIONARY REPRESENTATION 


Information on the design and structure of a database is 
normally stored in a central repository known as the data 
dictionary. Because the information contained in the 
dictionary describes other data, it is often termed metadata. 
In relational systems, metadata typically is stored in 
relations, or dictionary tables. For example, ORACLE uses 39 
dictionary tables to store metadata in its data dictionary. 
Information contained in the tables includes system 
administration data as well as data about user-defined 
relations and their attributes. 

Data dictionaries are classified as either passive or 

active in nature (Dolk and Kirsch, 1987, p. 49). A passive 

dictionary is similar to a language dictionary found in the 

library. A passive dictionary merely documents metadata. The 

database system does not rely on the dictionary to control and 
/ 

process data. In fact, the system may obtain its metadata 
from other sources. On the other hand, an active data 
dictionary is the sole source of metadata for the DBMS. The 
dictionary must be accessed for each process or transaction 
allowing for implementation of much more powerful control 
mechanisms. There is a performance penalty, however, since 
the system must access the dictionary for every transaction. 
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Implementation of a type analysis shell requires an active 
data dictionary. In addition to the standard dictionary 
tables, three tables are needed to contain the semantic 
information associated with numeric valued data. One table 
holds the data needed for automatic conversion of commensurate 
units. The second table associates a concept with each 
relation. Finally, a third dictionary table contains the 
quantity and unit of measure associated with each numeric 
valued attribute or column. 

1. The UNITS Dictionary Table 

The UNITS dictionary table is used to identify 
commensurate units and the conversion factors needed to make 
units equivalent. Recall that each unit of measure is 
associated with a dimension. Commensurate units are 
associated with the same dimension and differ only by a scalar 
conversion factor. Thus, our UNITS dictionary table must 
contain the unit of measure, its associated dimension and the 
conversion factors. A relational schema for the table is 
shown in Figure 4-3. 


Unit 

| Dimension 

dl 

d2 

Conversion Factor 


Figure 4-3. UNITS Dictionary Table 

The unit and dimension attributes are stored as 
character strings. The d- and d; columns contain the numeric 
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representation of the dimension. Commensurate units are 
identified by cross multiplying the d-'s and d-'s and checking 
for numeric equality. The conversion factors are stored as 
character strings to allow insertion into the SQL statement 
during preprocessing. 

Notice there is only one conversion factor per unit of 
measure. The system recognizes a specific base unit for each 
of the nine fundamental dimensions (see Figure 4-4). All 
other units are defined in terms of the base units. 


Dimension 

d-1 

d2 '' 

Base Unit 

(nul1) 

1 

1 

(unitless) 

currency 

2 

1 

Dollar(US) 

length 

3 

1 

Meter 

mass 

5 

1 

Kilogram 

time 

7 

1 

Second 

electric current 

11 

1 

Ampere 

temperature 

13 

1 

Degrees(Kelvin) 

luminous intensity 

17 

1 

Candela 

amount of substance 

19 

1 

Mol e 


Figure 4-4. Base Units of Measure 


The factor stored in the dictionary allows conversion of units 
to the base. Thus, commensurate units are converted into 
identical base units for use in SQL expressions. 
Extensibility is supported by allowing the user to define 
units using the base units or any unit of measure already 
defined. 
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2. The CONCEPTS Dictionary Table 

The CONCEPTS dictionary table associates a semantic 
concept with each relation. Concepts cannot be derived 
through multiplication or division, therefore a numeric 
representation is not necessary. The relational schema for 
the CONCEPTS dictionary table is shown in Figure 4-5. Both 
attributes are stored as character strings. 


Table-Name 


Concept 


Figure 4-5. CONCEPTS Dictionary Table 
3. The QUANTITIES Dictionary Table 

The QUANTITIES dictionary table contains the quantity 
and unit of measure associated with each numeric valued 
attribute. Like dimensions, quantities can be either 
fundamental or derived from other quantities. The same 
numeric scheme is used to represent quantities in the data 
dictionary. The relational schema is shown in Figure 4-6. 


Table-name 

j Column-name 

| Quantity 

qi 

q2 

Units 


Figure 4-6. QUANTITIES Dictionary Table 

Once the semantics have been added to the database, we 
must provide a mechanism for their use in the type checking of 
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expressions. In the next chapter we discuss implementation 
of a preprocessor which evaluates SQL statements for semantic 
consistency before they reach the DBMS for processing. 
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V. EVALUATING SQL EXPRESSIONS 


Storing semantic information in a data dictionary allows 
us to check automatically for semantic consistency of SQL 
SELECT statements. This is easily accomplished by an SQL 
preprocessor before the statement is passed to the DBMS for 
routine processing. The preprocessor is concerned only with 
SELECT statements. All other SQL statements are passed 
directly to the DBMS for routine processing. The preprocessor 
must perform three primary operations: 

1. Retrieve the required semantic information from the data 
dictionary tables. 

2. Evaluate the search conditions in the WHERE and HAVING 
clauses for semantic consistency. 

3. Convert commensurate units into a common unit of measure. 

In this chapter, we discuss implementation of the SQL 
preprocessor. The discussion focuses on verification of 
quantity consistency and conversion of commensurate units. 
Evaluating concept consistency is more difficult and is 
discussed in the next chapter. 
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A. ACCESSING THE DATA DICTIONARY 

The data dictionary tables are updated each time a CREATE 
TABLE or CREATE VIEW command is entered by the user. The user 
must define a concept for each table created and a quantity 
and unit of measure for each numeric-valued attribute. 
Quantities may be defined in terms of known quantities or may 
be fundamental. Units of measure msut be defined in terms of 
known units of measure. 

If a table or view is dropped from the database, the 
corresponding rows in the CONCEPTS and QUANTITIES dictionary- 
are deleted. Similarly, the dictionary tables are updated 
each time a database table is altered by adding or deleted an 
attribute. 

The FROM clause of the SELECT statement identifies the 
relation or relations to be queried by the statement. The 
table-names specified in the FROM clause can be used to 
generate three data dictionary queries described below. The 
results of the three data dictionary queries should be stored 
in temporary data structures in computer memory. This 
minimizes the number of database accesses and speeds 
preprocessing. 

The first query retrieves the concepts associated with 
each table from the CONCEPTS dictionary table. Entries are 
made in the CONCEPTS table only if a relation contains numeric 
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valued attributes. Thus, if there are no entries in the 
CONCEPTS table corresponding to the tables named in the FROM 
clause, there will be no numeric valued attributes involved in 
the query and no need for further preprocessing. The SELECT 
statement can be passed directly to the DBMS for normal query 
processing. 

The second query to the data dictionary, retrieves the 
quantities and units of measure from the QUANTITIES table. 
Entries from the QUANTITIES table identify the numeric valued 
attributes in each table and the quantity and units of measure 
associated with each attribute. The results of this query 
subsequently are used to retrieve the appropriate conversion 
factors from the UNITS table. We now have all the information 
needed to evaluate search conditions for semantic consistency. 

B. EVALUATING SEARCH CONDITIONS 

Search conditions are used in the WHERE clause and in the 
HAVING clause of the SELECT statement, as shown below 
(Viescas, 1989, p. 64): 

SELECT [ALL i DISTINCT] select-list 

FROM {{table-name j view-name} [correlation-name]},... 

[WHERE search-condition] 

[GROUP BY {column-name ! column-number},...] 

[HAVING search-condition] 

[{INTERSECT ! MINUS ! UNION [ALL]} select_statement] 

[[ORDER BY {{column-name i column-number}[ASC 

!DESC]},...]! 

[FOR UPDATE OF {column-name},...]] 
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The search condition identifies a simple or compound 
predicate that is true, false, or unknown about a given row. 
The condition defines which rows will appear in the resulting, 
logical table. If the condition is true when applied to the 
row, the row is included in the result table. Application of 
a search condition does not change the value of the attributes 
contained in a row. (Viescas, 1989, p. 63) Syntax for the 
search condition is shown here: 

[NOT] {predicate i (search-condition)} 

[AND ! OR} [NOT] {predicate i (search-condition)}]... 

The standard Eoolean operators--AND, OR, and NOT--combine 
individual predicates into compound search conditions. The 
Boolean operators accept only three values--true, false, and 
unknown--and are not subject to the rules of semantic 
consistency applied to numeric expressions. 

There are two types of predicates which involve numeric 
valued attributes and require semantic evaluation. The first 
compares an expression to a range of values. The second 
compares the values of two expressions using standard 
relational operators. 

The BETWEEN predicate compares a value with a range of 
values. 

expression [NOT] BETWEEN expression AND expression 
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The data types of all expressions must be compatible and each 
expression must be semantically consistent. 

The comparison predicate compares values of two 
expressions using the standard relational operators. The 
semantics of the first expression must be the same as those of 
second expression. 

expression {=!<>!>!<!>=!<=} expression 

To verify the semantic consistency of the BETWEEN or the 
comparison predicate, we must first evaluate their component 
expressions. Numeric-valued expressions obey the following 
syntax: 

[+i-j { (expression) ! literal j column-name} 

[{+!“!*!/} { (expression) j literal ! column-name} ]... 

Expressions are evaluated for correctness according to the 
laws of semantic consistency. This is easily accomplished by 
first converting the expression to postfix form and then 
conducting operations on a simple stack. As each operand is 
identified, its semantics are retrieved from the temporary 
dictionary tables created earlier. The semantics are then 
concatenated to the operand token and pushed onto the stack. 
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1. Quantity Consistency 

As the quantity of each operand is retrieved, its two- 
part prime product representation (q^qj) is pushed onto the 
stack. When an arithmetic operator is read, its operands will 
be the two top elements of the stack. These are popped from 
the stack, the indicated operation is performed and the 
resulting quantity is returned to the stack. 

a. Multiplication and Division 

When a multiplication operator is encountered, the 
top two quantities ( (q^^r and (q^qj)*) are popped from the 
stack. The two ql's are multiplied to give the resulting 
representation (q)'. The product of the Cj's gives the 
resulting (q-)' value. The new (qi^)' is returned to the 
stack. 

Division requires a similar though slightly more 
complicated procedure. The first quantity popped from the 
stack (q-^j)' represents the divisor, the second quantity 
the dividend. The resulting quantity (q-jq-) 1 is 
determined by cross multiplying: 

(<3;)’ = (<I2 ) 2 * (^i) 1 and (*32) * = ( q i) 2 * ( q 2^ 

b. Addition and Subtraction 

Addition and subtraction require that both operands 

9 1 

have the same quantity. If (qi^) and (q^qi)* differ, the 


45 




expression is inconsistent. The quantity resulting from an 
addition or subtraction is the same as that of the operands 

((qwq2) ,= (<3i,q 2 ) 2 = (q^qj) 1 )- 

Unary plus and minus operators involve only one 
operand. However, they do not affect the quantity or unit of 
measure of the operand. Thus, unary operators are ignored by 
the preprocessor. 

c. Relational Operators 

Like addition and subtraction, the relational 
operators ( =, <>, >, <, >=, <= ) require that both operands 
have the same quantity. The result of a relational operation 
is either true or false. It does not have a quantity. 

2. Handling Literals 

Expressions in SQL may contain numeric constants or 
literals. These are known to the system only at the time the 
query is being processed. The data dictionary holds no 
concept, quantity or units description for the literals. 
However, the semantics of these constants must be considered 
when evaluating the expression for consistency. 

There are two ways to deal with literals*. The first 
is to require all literals to be declared as defined constants 
and given a variable name. A quantity and unit of measure is 
assigned by the user when the constant is defined. 
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The second method assumes the user intends the literal 
to have a quantity and unit of measure consistent with the 
context of the expression. In this case, we must introduce a 
universal type, (ul,u2). It is equivalent to all other 
quantities for addition, subtraction and comparison. For 
multiplication and division, it is equivalent to a 
dimensionless quantity where (ql,q-2) = (1,1). (Bradley and 
Clemence, p. 47) 


C. AN EXAMPLE OF PREPROCESSING 

Evaluation of an SQL SELECT statement is demonstrated in 
the following example. Consider the following relations: 


SHIP (concept = SHIP) 
SKIP_NAME 
LENGTH 
BEAM 
DRAFT 

KOLD_LENGTK 
HOLD__WIDTK 
HOLD_KEIGHT 
HATCK_AREA 
CARGO CAPACI 


(Quantity 

(Quantity 

(Quantity 

(Quantity 

(Quantity 

(Quantity 

(Quantity 

Y(Quantity= 


length,Units = Ft) 
width. Units = Ft) 
depth. Units = Ft) 
length. Units = Ft) 
width,Units = Ft) 
height,Units = Ft) 
area,Units = SqFt) 
weight,Units = Tons) 


CRATE (concept = CRATE) 

CRATE_ID 

LENGTH (Quantity = length,Units = Meters) 
WIDTH (Quantity = width,Units = Meters) 

HEIGHT (Quantity = height,Units = Meters) 

WEIGHT (Quantity = weight,Units = Kilograms) 


For a crate to be loaded onto a ship it must fit through 
the ship's hatch. Crates are stacked five high inside the 
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ship's hold. The following SELECT statement will identify 
those crates that can be loaded on a specific snip: 

SELECT CRATE_ID 
FROM CRATE, SHIP 

WHERE SHIPJNAME = "TITANIC" AND 

CRATE.LENGTH*CRATE.WIDTH < HATCK_AREA AND 
CRATE.HEIGHT*5 < HOLD_HEIGHT; 

The FROM clause provides the table names needed to query 

the data dictionary. Three temporary dictionary tables are 

established by the preprocessor: 

CONCEPTS 

Table Name Concept 
CRATE Crate 

SHIP Ship 


QUANTITIES 


Table-name 

Column-name 

Ouantity 

Si 

Si 

Units 

CRATE 

LENGTH 

1ength 

2 

1 

Meters 

CRATE 

WIDTH 

width 

3 

1 

Meters 

CRATE 

HEIGHT 

height 

5 

1 

-L 

Meters 

CRATE 

WEIGHT 

weight 

7 

1 

J. 

Kilograms 

SHIP 

LENGTH 

1ength 

2 

1 

JL 

Ft 

SKIP 

BEAK 

width 

3 

1 

Ft 

SHI? 

DRAFT 

depth 

11 

1 

aL 

Ft 

SKIP 

KOLD_LENGTK 

length 

2 

1 

Ft 

SHIP 

HOLDJHIDTE 

width 

3 

x 

Ft 

SHI? 

HOLD_HEIGKT 

height 

5 

X 

Ft 

SHIP 

HATCK_AREA 

area 

6 

1 

JL 

SqFt 

SKIP 

C ARGO__C AP AC IT Y weight 

7 

1 

Tons 


UNITS 


Units 

Meters 

Kilograms 

Dimension 

1ength 
mass 

d, 

3* 

5 

* 

X 

1 

Conversion Factor 

Ft 

length 

3 

I 

0.3048 

SqFt 

1ength*1ength 

9 

1 

0.09290304 

Ton 

mass 

5 

t 

X 

(.45359/2000) 
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Notice that the derived quantity area is defined in terms of 
the quantities length and width. 

The WHERE clause contains a compound search condi. ^.1 with 
three distinct predicates. The first, SHIPJNAME = "TITANIC" 
involves non-numeric attributes and is ignored by the 
preprocessor. The remaining predicates involve numeric 
expressions and must be evaluated for semantic consistency. 

Evaluation begins by converting the expressions into 
postfix form. 


Expression 


Postfix Representation 


GA;-..:XGTH*CSA;:.«:DiH < HATCH_AREA CFMEAEMlRJMUJIDlRSWlCRJRE^i 


CRATE.K:;SK;*5 < HOLD HEIGHT 


CRAIE.HEIGHI I 5 I *,H0LD_HEI6HT,< 


Figure 5-1 shows the contents of the postfix stack as the 


expressions are evaluated. 


Symbo1 

Quantity 

Oper 1 


Oper2 

Resu 11 

Stack 

CRATE.LENGTH 
CRATE.WIDTH 
* 

(2,1) 

(3,1) 

(2, 

D 

(3,1) 

(6,1) 

(2,1) 

(2.1) (3,1) 

(6.1) 

HATCH_AREA 

< 

(6,1) 

(6, 

D 

(6,1) 

Logical 

(6,1)(6,1) 
Consistent 

CRATE.HEIGHT 
5 
* 

(5,1) 

(ul,u2) 

(5, 

D 

(u1,u2) 

(5,1) 

(5.1) 

(5.1) (u1,u2) 

(5.1) 

HOLD_HE1GHT 

< 

(5,1) 

(5, 

D 

(5,1) 

Logical 

(5,1)(5,1) 
Consistent 


Figure 5-1. Evaluation of Postfix Expressions 


Because the system has no way of converting quantities 
into one another, inconsistent expressions cause the SELECT 
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statement to be returned to the user for correction. Figure 

5-2 shows the progress of the stack for: 

SELECT CRATE-ID 
FROM CRATE, SHIP 
WHERE SHIP-ID = "TITANIC" AND 

CRATE.LENGTH <= CRATE.WE IGHT; 

The expression CRATE.LENGTH <= CRATE.WE IGHT tries to compare 
an attribute representing a length to one representing a 
weight. This is inconsistent with respect to quantity. 


Symbo1 

Quantity 

Oper 1 

Oper2 

Resu 11 

Stack 

CRATE.LENGTH 
CRATE.WE 1GHT 
< = 

(2,1) 

(7,1) 

(2,1) 

(7,1) 

Logical 

(2,1) 

(2,1)(7,1) 
INCONSISTENT 


Figure 5-2. Stack Contents For Inconsistent Expression 


D. CONVERSION OF COMMENSURATE UNITS 

An expression that is consistent in quantity will also be 
dimensionally consistent. All that remains before the 
statement is passed to the DBMS is the conversion of 
commensurate units of measure. To simplify the conversion, 
all commensurate units are converted into the base units of 
measure. For example, both feet and inches will be converted 
into the base unit of measure for length, meters. 

Every numeric-valued operand in an expression will have a 
conversion factor applied before the statement is passed to 
the DBMS. The conversion factor is stored in the UNITS 
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dictionary table as a character string which is inserted prior 
to the operand in the SQL statement. Conversion factors for 
base units are null strings and will not change the original 
SQL statement. In our example, the SELECT statement is 
modified as follows: 

SELECT CRATE_ID 

FROM CRATE, SHIP 

WHERE SHIP_NAME = "TITANIC" AND 

CRATE.LENGTH*CRATE.WIDTH < 0.09290304*HATCH_AREA AND 
CRATE.HE IGHT*5 < 0 . 3048 * HOLD_HEIGHT; 

Literals are assigned the conversion factor of the operand 
preceding them in the postfix expression. The user is 
responsible for ensuring that the value represented by the 
literal is expressed in the appropriate units. 

We have demonstrated a simple scheme for automatic 
verification of quantity consistency and conversion of 
commensurate units in database systems. Responsibility for 
ensuring dimensional consistency can now be shifted from the 
user to the database system. Verifying concept consistency 
poses a more difficult challenge, however. Assigning 
semantics to a relation resulting from a join also poses a 
problem. In the next chapter, we address these problems and 
introduce the concept hierarchy as a possible approach to 
solving them. 
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VI. CONCEPT HIERARCHIES 


Verification of concept consistency in relational 
operations is a difficult problem. An SQL search condition 
will be consistent only if the concepts associated with each 
operand are identical. Since attributes inherit the concept 
associated with the relation, strict consistency would allow 
expressions to contain only attributes drawn from relations 
having the same concept. There is no way to combine concepts 
to create new concepts. Most database queries, however, 
involve more than one relation and most likely more than one 
concept. Search conditions rarely involve attributes from 
just one of the relations being joined. 

If there is a way of relating concepts to one another, it 
becomes possible to coerce concept consistency by substituting 
related concepts when evaluating expressions. In this 
chapter, we introduce concept hierarchies as a way of 
determining the relationships between concepts. We begin with 
a discussion of the binary relationships which link database 
tables. These relationships can be combined into a variety of 
larger structures represented by directed graphs. A concept 
hierarchy is realized when concepts are included in the graph. 
This hierarchy reflects the semantic intent of the database 
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design and can be used to ensure queries are consistent with 
the design. 

A. LINKING RELATIONS 

In the relational data model each relation represents a 
collection of instances of a specific entity type. Specific 
instances (rows) are identified’by a unique value in at least 
one column of the relation. This is known as the primary key 
for the relation. The key is used to identify each row and to 
keep all rows distinct. 

Primary keys from one relation will often appear as 
columns in another relation. This indicates a relationship 
between the two tables. When a primary key is included as an 
attribute of a second relation, it is called a foreign key . 
A value of a foreign key refers to a row in the original 
relation. Referential integrity is achieved when each value 
(i.e., row) of the foreign key specifically refers to one and 
only one row of the parent relation. Foreign keys can be used 
to represent three primary types of relationships between 
tables: one-to-one, one-to-many, and many-to-many. These 
relationships are considered binary because they 1 ink only two 
entity types. Two or more binary relationships can be used to 
create more complicated structures. 
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1. One-To-One Relationships (1:1) 

In a one-to-one (1:1) relationship, an instance of one 
relation is related to no more than one instance of another 
relation. This is represented by the placement of the key of 
either relation in the other relation. This is the simplest 
form of a binary relationship. (Kroenke and Dolan, 1988, p. 
169-174) 

Suppose a shipping crate can contain only one item. 
There is a one-to-one relationship between the crate and its 
item. Figure 6-1 shows the two relations. Primary keys are 
underlined. Foreign keys are marked with an asterisk. Notice 
that the primary key (ITEM-ID) of the ITEM relation is placed 


a foreign 

key in the CRATE relation. 

ITEM 




1 tern-1D 

Description 

Weight 

Cost 


CRATE 


Crate-No 

Length 

Width 

j- Height j 

Weight j 

1 tern-1D* 


Figure 6-1. Representing a One-to-One Relationship 

2. One-To-Many Relationships (1:N) 

In a one-to-many (1:N) relationship, a row of one 
relation is related to potentially many rows of another 
relation. The relation on the one side of a 1:N relationship 
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✓ 


is said to be the parent. The other relation is the child. 
To represent a 1:N relationship, the key of the parent 
relation is placed as a foreign key in the child. (Kroenke 
and Dolan, 1988, p. 174-178) 

A 1 :N relationship is illustrated in Figure 6-2. A 
ship can carry many crates but each crate can be on at most 
one ship. This relationship is represented by placing the key 
of the parent relation ( Ship-Name ) as a foreign key in the 
child re 1 at ion. 


SHIP 


Ship-Name 

Length 

Beam 

Draft 

* ♦ * 


CRATE 


Crate-No j 

Length 


1 tern-1D* 

Ship-Name* 


Figure 6-2. Representing a One-To-Many Relationship 

3. Many-To-Many Relationship (M:N) 

A many-to-many (M:N) relationship cannot be directly 
represented in the relational model. In an M:N relationship, 
a row of one relation corresponds to many rows of another 
relation. A row in the second relation also corresponos to 
many rows of the first relation. To represent this, we need 
a third relation called an intersection relation. An 
intersection relation is composed of the keys of each of the 
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related relations. In this case the intersection relation is 
a child to both relations. (Kroenke and Dolan, 1988, p. 178- 
182) 

A single ship may call at many ports and each port can 
be a destination for many ships. This many-to-many 
relationship is represented by the relations in Figure 6-3. 
The keys of the SHIP relation ( Ship-Name) and of the PORT 
relation ( Port-Name) are combined to form the SHIP-PORT 
relation. The key of this intersection relation is the 
combination of the keys of its parents ( Ship-Name*+Port- 
Name* ). 


SHIP 


Ship-Name 

Length j 

Beam 

Draft 

• ♦ • 


SHIP-PORT 


Ship-Name* 


Port-Name* 


PORT 


Port-Name 

Country 

Location 

Depth 



Figure 6-3. Representing a Many-To-Many Relationship 


B. DIRECTED GRAPHS 

Binary relationships can be combined into a variety of 
larger structures. It is often useful to represent these 
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structures graphically. A graph can be thought of as a set of 
nodes (or vertices) and arrows (or arcs) between the nodes. 
Each arc is specified by a pair of nodes. If the pairs of 
nodes that make up the arcs are ordered pairs, the graph is 
said to be a directed graph. (Tenebaum, Langsam, and 
Augenstein, 1990, p. 503) Since a binary relationship always 
identifies a parent and child, it can always be represented as 
a directed graph. 

If the primary key of a relation is included as a foreign 
key in a second relation, the first relation will be a parent 
to the second. Thus, identifying the foreign keys in a table 
will identify the table's parent relations. This, in effect, 
orders pairs of relations and provides a means of building a 
directed graph. Figure 6-4 shows a directed graph 
representing the relationships between the ITEM, CRATE, SHIP 
and PORT relations. 



Figure 6-4. Directed Graph of Binary Relationships 


Each relation represents a node in the graph. Links between 
relations are represented as arrows between nodes. 
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C. CONCEPT HIERARCHIES 

Each relation in the directed graph will have an 
associated concept. If these concepts are substituted at each 
node of the graph (Figure 6-5) we establish a concept 
hierarchy representing the relationships between concepts. 
The term hierarchy is used here rather 1oosely to convey the 
ordered parent-child structure of the graph. Intersection 
relations used to represent M:N relationships will not 
normally have a unique concept, so for the time being, we 
simpiy label the intersect nodes. To complete the hierarchy 
we include a "dummy" node or universal concept (*) at the root 
of the graph. All other nodes are descended from this root. 



Figure 6-5. Concept Hierarchy 


Since the concept hierarchy is based on the links between 
relations, it reflects the semantic intent of the database 
design. Thus, the graph can be used to ensure database 
queries are semantically consistent with the design of the 
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database. Expressions are evaluated by first determining the 

concept associated with each of the operands. If the concepts 

differ, we then begin at the lowest occurrence of each concept 

on the concept graph and search upward until a common node is 

reached. If the lowest common concept is (*), the expression 

is inconsistent. Otherwise, the operands share a common 

concept through inheritance and the expression is consistent 

with the semantic design of the database. 

< 

For example, suppose the crates described above can be 
shipped by either air or sea. Like a ship, an airplane can 
carry more than one crate (1:N between AIRCRAFT and CRATE) and 
can land at more than one city (M:N between AIRCRAFT and 
PORT). The AIRPORT relation represents the intersection of 
AIRCRAFT and PORT. The concept hierarchy is shown in Figure 
6 - 6 . 



if we wanted to know which crates were too wide to load on 
a specific aircraft, the query might look like: 
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SELECT Crate-ID 

FROM Crate, Aircraft 
WHERE Aircraft-ID = "DC-10" ’ AND 
Crate.Width < Aircraft.Width; 

The query involves a compound search condition containing two 
predicates. The first, Aircraft-ID = "DC-10" does not involve 
numeric-valued attributes. The second, Crate.Width < 
Aircraft.Width, does involves numeric attributes with 
different concepts: 

Crate.width —> Concept = Cargo 

Aircraft.Width —> Concept = Aircraft 

The concept hierarchy in Figure 6-6 shows the two concepts 
share a common descendant, in this case "aircraft." Thus, the 
expression "Crate.width < Aircraft.Width" is consistent with 
the semantic intent of the database design and can be allowed. 
The expression must now be checked for dimensional consistency 
and, if required, commensurate units of measure must be 
reconci1ed. 

Now suppose we wanted know which ships were wider than a 
DC-10. 

SELECT Ship-Name 

FROM Ship, Aircraft 

WHERE Aircraft-!D = "DC-10" AND 

Ship.beam > Aircraft.Width; 
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Again the query has a compound search condition and the second 
predicate involves operands having different concepts. 

Ship.width —> Concept = ship 
Aircraft.width —> Concept = aircraft 

A search of the concept graph returns (*) as the only common 
concept. Since there is no traceable relationship between 
ship and aircraft, the expression "Ship.beam > Aircraft.width" 
has no meaning in the context of the database design. 

Many-to-many relationships are handled differently. The 
intersection relation used to represent these relationships is 
fundamentally different from other child relations. The 
intersection wi11 not normally have meaning independent of its 
parent reTations and thus will not have a concept.. 
Technically, the intersection inherits the concept of each of 
the parents. 

Until this problem of multiple inheritance is resolved, we 
assign a place holding concept, "(intersect)’’, to these 
relations in order to complete the concept hierarchy. We then 
allow the (intersect) concept to reverse the direction of its 
associated arcs, as shown in Figure 6-7. This has the effect 
of placing the intersection &bove its parent nodes on the 
concept graph. Our search algorithm will then recognize the 
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relationship as valid and allow queries across both parent 
relations. 



We have indicated how a concept hierarchy might be used to 
ensure that queries are consistent with a specific database 
design. (The hierarchy also might be used to determine a 
concept of the relation returned by a query.) By identifying 
the concepts associated with each of the attributes in the 
select-list, we can search the hierarchy for the lowest common 
concept and assign it to the resulting relation. When more 
than one common concept is found the user will be required to 
reconcile the ambiguity. 

The idea of concept complements the notions of quantity 
and unit to enforce a higher level of semantic consistency 
than is currently available in commercial relational database 
systems. However, the use of concept heirarchy in this 
chapter differs from the original notion introduced by Bradley 
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and Clemence. The ideas presented in this section are 
preliminary in nature. More work needs to be done 
investigating the *~arnifications of this approach as indicated 
in the concluding section. 
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VII. CONCLUSI ON 


This thesis has presented a simple and efficient method of 
implementing a semantic type checking system for use with 
relational databases. By defining a concept, quantity and 
unit of measure for each numeric-valued data attribute, we 
increase the integrity of data manipulation. We can ensure 
queries are consistent with the design of the database. Only 
a human can determine the significance and meaning of data. 
We have shown, however, that the database system can share in 
the responsibility of handling the data in a manner consistent 
with its assigned meaning. 

Manipulation of numeric data must conform to the laws of 
dimensional consistency. This is well known in the scientific 
and technical communities which deal with mostly numeric data 
measured across a variety of systems of measurement. There 
are obvious benefits to be gained by automating the 
dimensional analysis and conversion of commensurate units in 
the databases which support these communities. 

Our research provides benefits outside the scientific 
community as well. As businesses gain greater access to 
international databases, the need for an automatic conversion 
of financial data becomes obvious. For example, including the 


64 





semantics of the data in the data dictionary would facilitate 
automatic tracking of currency fluctuations. This guarantees 
an accurate reflection of current market values. 

There are also benefits to be gained by applying concept 
hierarchies to non-numeric data. A concept represents a real 
world object or event. Every table in a relational database 
has an associated concept regardless of the data types of its 
attributes. By identifying the foreign keys in each table we 
can build a concept hierarchy for any database. The hierarchy 
is guaranteed to reflect the semantic intent of the database 
design.. For queries to be meaningful within the context of 
the design, they must be consistent with the concept 
hierarchy. 

Our proposed system is not without its limitations. It 
has no provision for conversion or coercion between 
quantities. Thus, a height cannot be compared to a length or 
a width, even when this may make sense in a query. This 
problem could be addressed by defining relationships between 
quantities similar to those linking concepts. A quantity 
hierarchy could then be established and used to coerce related 
quantities. Determining whether such a scheme would provide 
sufficient benefits is an area for future research. 

We have also largely ignored the problem of inheritance. 
We have assumed that a relation’s attributes inherit the 





concept associated with the relation. Foreign keys then have 
more than one concept. This presents no problem for semantic 
evaluation of SQL expressions, however. Concept consistency 
depends only on the relationships between database tables, not 
their attributes. 

The relationships between concepts and quantities require 
further research. If a concept is assigned to a relation, 
must the relation contain all the attributes associated with 
the concept? Which concept and quantities should be applied 
to the relation produced by joining conceptually different 
re 1 at ions? 

Relational databases provide a powerful and easily 
understood data access capability. Some might argue that 
enforcing semantic consistency restricts and limits this 
capability. As databases grow larger and more integrated, 
however, unrestricted data manipulation increases the 
likelihood of improper handling of data and misinterpretation 
of the query results. Only by enforcing the semantic intent 
of the database design can we guarantee consistent and 
meaningful interpretations. The results of this thesis 
demonstrate a technique for enforcing semantic consistency 
which extends features currently existing in relational 
database systems. 
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APPENDIX A: SAMPLE DATA DICTIONARY TABLES 


Schema: 

CONCEPTS Dictionary Table 


TABLE_NAME 

CONCEPT 


CHAR(20) 
CHAR(20) 


QUANTITIES Dictionary Table 


TABLE_NAME 

COLUMN_NAME 

QUANTITY 

Q1 

Q2 

UN I TS 


CHAR(20) 
CHAR(20) 
CHAR(20) 
NUMBER(4) 
NUMBER(4) 
CHAR(20) 


UNITS Dictionary Table 


UNIT 

DIMENSION 
D1 
D2 

CONVERSION_STRING 
Sample Dictionary Tables: 
CONCEPTS: 

TABLE NAME 


CHAR(20) 
CHAR(20) 
NUMBER(4) 
NUMBER(4) 
CHAR(20) 


CONCEPT 


US_ROUTES 
INTERNATL_ROUTES 
SHIP 
CRATE - 
ITEM 
PORT 

SHIP_PORT 

AIRCRAFT 

AIRPORT 


A IR_ROUTE 

A IR_ROUTE 

SHIP 

CARGO 

ITEM 

CITY 

(INTERSECT) 
AIRCRAFT 
(INTERSECT) 







QUANTITIES: 


TABLE_NAME 

COLUMN_NAME 

QUANTITY 

Q1 

Q2 

UNITS 


US_ROUTES 

MILEAGE 

DISTANCE 

13 

1 

MILES 

i 

1NTERNATL_ROUTES 

DISTANCE 

DISTANCE 

13 

1 

KILOMETERS 



SHIP 

LENGTH 

LENGTH 


2 1 

FT 

SHIP 

BEAM 

WIDTH 


3 1 

FT 

SHIP 

DRAFT 

DEPTH 


11 1 

FT 

SHIP 

HOLD_LENGTH 

LENGTH 


2 1 

FT 

SHIP 

HOLD_W1DTH 

WIDTH 


3 1 

FT 

SHIP 

HOLD_HE1GHT 

HEIGHT 


5 1 

FT 

SHIP 

HATCH_AREA 

AREA 


6 1 

SQFT 

SHIP 

CARGO_CAPAC1TY WEIGHT 

7 1 

TONS 

CRATE 

LENGTH 

LENGTH 


2 1 

METERS 

CRATE 

WIDTH 

WIDTH 


3 1 

METERS 

CRATE 

HEIGHT 

HEIGHT 


5 1 

METERS 

CRATE 

WEIGHT 

WEIGHT 


7 1 

K1LOGRAMS 

PORT 

DEPTH 

DEPTH 


11 1 

FT 

UN 1TS: 






UNIT 

DIMENSION 

D1 

D2 

CONVERS1ON_STR1NG 

UNITLESS 

(NULL) 

1 

1 

(Base 

Unit) 

USDOLLAR 

CURRENCY 

2 

1 

(Base 

Unit) 

METERS 

LENGTH 

3 

•t 

1 

(Base 

Unit) 

K1LOGRAMS 

MASS 

5 

1 

(Base 

Unit) 

SECONDS 

TIME 

7 

1 

(Base 

Unit) 

AMPS 

ELEC_CURRENT 

1 1 

1 

(Base 

Unit) 

DEG_K 

TEMPERATURE 

13 

1 

(Base 

Unit) 

CANDELA 

LUM1NOUS_lNTENS1TY 

17 

1 

(Base 

Unit) 

MOLE 

AMT_SUBSTANCE 

19 

1 

(Base 

Unit) 

FT 

LENGTH 

3 

1 

0.3048* 

SQFT 

LENGHT*LENGTH 

9 

1 

0.09290304* 

TONS 

MASS 

5 

1 

(.45359/2000)* 

MILES 

LENGTH 

3 

1 

1.609344* 

CU_METERS 

length*length*lengHt 






27 

1 

(Base Unit) 
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APPENDIX B: POSTFIX ALGORITHMS 

The algorithms presented here represent the major 
functions of the preprocessor. The routines needed to access 
the data dictionary tables, store the results of queries, and 
the search routines have been omitted. 

Initial implementation of the preprocessor was concerned 
only with evaluating SQL SELECT statements. Entries to the 
dictionary tables were made manually. 

The get_token() function returns the next token from the 
SQL command. The preprocessor recognizes the following token 
types: DELIMITER, OPERATOR, BOOLEAN_OP, IDENTIFIER, LITERAL, 
NUMBER, and COMMAND. OPERATOR includes both the arithmetic 
and relational operators. BOOLEAN_OP is either AND, OR, or 
NOT. A LITERAL is a string enclosed by single quotes. NUMBER 
is for numbers and COMMAND is assigned when an SQL keyword is 
encountered. 

IDENTIFIER and NUMBER represent complex structures 
consisting of : TOKEN_STRING, TABLE_NAME, COLUMN_NAME, CONCEPT, 
Q1, Q2, and CONVERS ION_STR I NG. For NUMBER tokens only the 
TOKEN_STR I NG, Q1 and Q2 fields are used. Q1 and Q2 are 
assigned the value -1. This flags the token for special 
treatment during evaluation. 
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CONVERTING INFIX EXPRESSIONS TO POSTFIX 

The following: algorithm converts an expression from infix 
to postfix notation. The algorithm requires a function 

t 

prcd(op1,op2) where op 1 and op2 are OPERATOR tokens. This 

function returns TRUE if opt has precedence over op2 when opt 

appears to the left of op2 in an infix expression. Precedence 

of operators is shown below: 

Highest * , / 

+ > “ 

Lowest =,<>,<,>,<=,>= 

For parentheses, prcd() returns the following results: 

prcd( '(', op ) = FALSE 
prcd( op, '(’ ) = FALSE 

prcd( op, ')' ) = TRUE « 

prcd( ')', op ) = undefined (an attempt to compare the 

two indicates an error) 

t 

The algorithm is shown here: 

while (not end of expression) { 
get_jtokenC); 

if (token is an identifier) 

add token to postfix string 
else { /* token is an operator */ 

while (operator stack is not empty) and 

Gtop operator in stack has precedence over 
token) { 

pop top operator from stack and add to 
postfix string 
} /* end while */ 

if (stack is empty OR token != ’)’) 
push token on to operator stack 
else /* top operator is open parenthesis */ 

pop top operator from stack and discard; v 

} /* end else */ 

} /* end while*/ 
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while (operator stack not empty) { 

pop the stack, add to postfix string 
}■ /* end while */ 


EVALUATING THE POSTFIX EXPRESS I ON 

Each operator in a postfix string refers to the previous 

two operands in the string. Each time we read an operand 

(i . e . , I DENT IFIER or NUMBER) we push it onto a stack. When we 

reach an OPERATOR, its operands will be the top two elements 

of the stack. We are concerned only with the Q1 and Q2 

elements of each operand. The following algorithm evaluates 

an expression in postfix using this method: 

/* scan the postfix string reading one 
token at a time */ 
while (not end of postfix string) { 
get_token() from postfix string 
if (token is an IDENTIFIER or NUMBER) 
push token onto stack 

else { 

/* token is an operator */ 
pop stack and assign to opnd2 
pop stack and assign to opndl 
switch token { 
case '*’: 

/* mulitply Ql’s and Q2’s */ 

result . Q1 = absva lue((opndl.Q1)*(cpnd2. Qt)) 

resu1 t. Q2 = absvalue((opndl.Q2)*(opnd2.Q2)) 

push result onto stack 

break 

case '/': 

/* cross multiply Ql’s and Q2's */ 
result . Q-T = absva lue( (opnd2. Q2)*(opnd 1. Q1) ) 
result.Q2 = absva1ue((opnd2.Q1)*(opnd1 . Q2)) 
push result onto stack 
break 

case '+': 
case '-': 
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/* 

Ql’s and Q2's must be 

equal 

*/ 

/* 

if one operand is a NUMBER 

*/ 

/* 

it takes the quantity 

of 

*/ 

/* 

the other operand 


*/ 

if 

(opndl.Q1 = -1 ) 




result. Q1 = opnd2.Ql 
result .Q2 = opnd2.G>2 


push result; 
if (opnd2.Q1 = -1) 

result.QI = opndl.QI 
result,Q2 = opndl.Q2 
push result; 

if (opndl.QI = opnd2.Q1 and 
opnd1.Q2 = opnd2.Q2) 

/* Operation allowed */ 
push opndl 
else { 

/* Quantity Inconsistency */ 
expression is inconsistency 
end evaluation 
} /* end else */ 

case 1 =’: 
case ': 
case ' < ' : 
case '>’: 
case ': 
case ’> = ’ : 

/* Ql's and Q2's must be equal */ 
/* if one operand is a NUMBER */ 
/* it takes the quantity of */ 

/* the other operand and the */ 

/* expression is allowed */ 

if (opndl.QI = -1 or 
opnd2.Q1 = -1 or 
opndl.QI = opnd2.Q1 and 
opndl.Q2 = opnd2.Q2) 

/* Operation allowed */ 
expression is consistent 
end evaluation 
else { 

/* Quantity Inconsistency */ 
expression is inconsistent 
end evaluation 
} /* end else */ 

} /* end while */ 
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